Overview

Dataset statistics

Number of variables: 25
Number of observations: 105542
Missing cells: 416
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 117.6 MiB
Average record size in memory: 1.1 KiB
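
The percentages and the average record size follow from the raw counts listed above. A minimal sketch of that arithmetic, assuming the total cell count is simply variables × observations and that 1 MiB = 1024 KiB (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 25
n_observations = 105_542
missing_cells = 416
total_size_mib = 117.6

# Assumption: every variable has one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 KiB.
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

The missing share works out to well under 0.1%, which is why the report shows only the "< 0.1%" bound.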

Variable types

Numeric: 10
Categorical: 15

Alerts

prod_name has a high cardinality: 45875 distinct values [High cardinality]
product_type_name has a high cardinality: 131 distinct values [High cardinality]
department_name has a high cardinality: 250 distinct values [High cardinality]
section_name has a high cardinality: 56 distinct values [High cardinality]
detail_desc has a high cardinality: 43404 distinct values [High cardinality]
article_id is highly correlated with product_code [High correlation]
product_code is highly correlated with article_id [High correlation]
department_no is highly correlated with index_group_no [High correlation]
index_group_no is highly correlated with department_no [High correlation]
perceived_colour_value_name is highly correlated with perceived_colour_master_name and 1 other field [High correlation]
index_group_no is highly correlated with index_name and 3 other fields [High correlation]
index_name is highly correlated with index_group_no and 3 other fields [High correlation]
perceived_colour_master_name is highly correlated with perceived_colour_value_name and 1 other field [High correlation]
index_group_name is highly correlated with index_group_no and 3 other fields [High correlation]
section_name is highly correlated with index_group_no and 4 other fields [High correlation]
colour_group_name is highly correlated with perceived_colour_value_name and 1 other field [High correlation]
product_group_name is highly correlated with garment_group_name [High correlation]
index_code is highly correlated with index_group_no and 3 other fields [High correlation]
garment_group_name is highly correlated with section_name and 1 other field [High correlation]
product_type_no is highly correlated with product_group_name and 6 other fields [High correlation]
product_group_name is highly correlated with product_type_no and 8 other fields [High correlation]
graphical_appearance_no is highly correlated with graphical_appearance_name and 3 other fields [High correlation]
graphical_appearance_name is highly correlated with product_group_name and 10 other fields [High correlation]
colour_group_code is highly correlated with colour_group_name and 2 other fields [High correlation]
colour_group_name is highly correlated with graphical_appearance_no and 10 other fields [High correlation]
perceived_colour_value_id is highly correlated with graphical_appearance_no and 5 other fields [High correlation]
perceived_colour_value_name is highly correlated with graphical_appearance_no and 5 other fields [High correlation]
perceived_colour_master_id is highly correlated with colour_group_code and 5 other fields [High correlation]
perceived_colour_master_name is highly correlated with colour_group_code and 7 other fields [High correlation]
department_no is highly correlated with product_type_no and 9 other fields [High correlation]
index_code is highly correlated with product_type_no and 12 other fields [High correlation]
index_name is highly correlated with product_type_no and 12 other fields [High correlation]
index_group_no is highly correlated with department_no and 6 other fields [High correlation]
index_group_name is highly correlated with department_no and 6 other fields [High correlation]
section_no is highly correlated with product_group_name and 9 other fields [High correlation]
section_name is highly correlated with product_type_no and 13 other fields [High correlation]
garment_group_no is highly correlated with product_type_no and 8 other fields [High correlation]
garment_group_name is highly correlated with product_type_no and 11 other fields [High correlation]
graphical_appearance_no is highly skewed (γ1 = -45.01901161) [Skewed]
article_id has unique values [Unique]

Reproduction

Analysis started: 2022-05-03 22:42:52.333333
Analysis finished: 2022-05-03 22:43:18.114324
Duration: 25.78 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
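
The reported duration can be cross-checked against the two timestamps. A small sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2022-05-03 22:42:52.333333")
finished = datetime.fromisoformat("2022-05-03 22:43:18.114324")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```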

Variables

article_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 105542
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 698424569.1
Minimum: 108775015
Maximum: 959461001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 824.7 KiB

Quantile statistics

Minimum: 108775015
5-th percentile: 493810022.1
Q1: 616992501
Median: 702213001.5
Q3: 796703001.8
95-th percentile: 889379005.9
Maximum: 959461001
Range: 850685986
Interquartile range (IQR): 179710500.8

Descriptive statistics

Standard deviation: 128462381.3
Coefficient of variation (CV): 0.1839316471
Kurtosis: 0.6609757612
Mean: 698424569.1
Median Absolute Deviation (MAD): 90074996.5
Skewness: -0.5772833542
Sum: 7.371312587 × 10^13
Variance: 1.650258342 × 10^16
Monotonicity: Strictly increasing
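
Several of the article_id figures are derived from others in the same blocks. The sketch below recomputes them from the reported values; it assumes the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), and the variable names are illustrative:

```python
# Reported article_id statistics, copied from the report.
minimum, maximum = 108775015, 959461001
q1, q3 = 616992501.0, 796703001.8
mean, std = 698424569.1, 128462381.3

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(value_range, round(iqr, 1), round(cv, 6), f"{variance:.6e}")
```

Because the inputs are already rounded in the report, the recomputed values agree only up to rounding.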
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
108775015              1       < 0.1%
760158001              1       < 0.1%
760214002              1       < 0.1%
760208001              1       < 0.1%
760195006              1       < 0.1%
760195005              1       < 0.1%
760195004              1       < 0.1%
760195003              1       < 0.1%
760195002              1       < 0.1%
760195001              1       < 0.1%
Other values (105532)  105532  > 99.9%
Value      Count  Frequency (%)
108775015  1      < 0.1%
108775044  1      < 0.1%
108775051  1      < 0.1%
110065001  1      < 0.1%
110065002  1      < 0.1%
110065011  1      < 0.1%
111565001  1      < 0.1%
111565003  1      < 0.1%
111586001  1      < 0.1%
111593001  1      < 0.1%
Value      Count  Frequency (%)
959461001  1      < 0.1%
957375001  1      < 0.1%
956217002  1      < 0.1%
953763001  1      < 0.1%
953450001  1      < 0.1%
952938001  1      < 0.1%
952937003  1      < 0.1%
952267001  1      < 0.1%
950449002  1      < 0.1%
949594001  1      < 0.1%

product_code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 47224
Distinct (%): 44.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 698424.5634
Minimum: 108775
Maximum: 959461
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 824.7 KiB

Quantile statistics

Minimum: 108775
5-th percentile: 493810
Q1: 616992.5
Median: 702213
Q3: 796703
95-th percentile: 889379
Maximum: 959461
Range: 850686
Interquartile range (IQR): 179710.5

Descriptive statistics

Standard deviation: 128462.3844
Coefficient of variation (CV): 0.183931653
Kurtosis: 0.6609758731
Mean: 698424.5634
Median Absolute Deviation (MAD): 90075
Skewness: -0.5772833945
Sum: 7.371312527 × 10^10
Variance: 1.650258421 × 10^10
Monotonicity: Increasing
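
The product_code statistics mirror the article_id statistics scaled down by a factor of 1000, which is consistent with the high-correlation alert between the two. The sketch below assumes, as a hypothesis suggested by the reported values rather than anything the report states, that product_code is article_id with its last three digits dropped. The helper name is illustrative, and the input list is the ten smallest article_id values listed in the report:

```python
from collections import Counter

# Ten smallest article_id values from the article_id section of the report.
smallest_article_ids = [
    108775015, 108775044, 108775051,
    110065001, 110065002, 110065011,
    111565001, 111565003,
    111586001, 111593001,
]

def product_code_of(article_id: int) -> int:
    """Hypothetical mapping: drop the last three digits of the article id."""
    return article_id // 1000

# Count how many of these articles share each derived product code.
derived_counts = Counter(product_code_of(a) for a in smallest_article_ids)
print(derived_counts)
```

Under this assumption the derived counts match the counts the report lists for the smallest product_code values (108775, 110065, 111565, 111586, 111593).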
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
783707                75      0.1%
684021                70      0.1%
699923                52      < 0.1%
699755                49      < 0.1%
685604                46      < 0.1%
739659                44      < 0.1%
664074                41      < 0.1%
570002                41      < 0.1%
562245                41      < 0.1%
685816                41      < 0.1%
Other values (47214)  105042  99.5%
Value   Count  Frequency (%)
108775  3      < 0.1%
110065  3      < 0.1%
111565  2      < 0.1%
111586  1      < 0.1%
111593  1      < 0.1%
111609  1      < 0.1%
112679  2      < 0.1%
114428  2      < 0.1%
116379  1      < 0.1%
118458  7      < 0.1%
Value   Count  Frequency (%)
959461  1      < 0.1%
957375  1      < 0.1%
956217  1      < 0.1%
953763  1      < 0.1%
953450  1      < 0.1%
952938  1      < 0.1%
952937  1      < 0.1%
952267  1      < 0.1%
950449  1      < 0.1%
949594  1      < 0.1%

prod_name
Categorical

HIGH CARDINALITY

Distinct: 45875
Distinct (%): 43.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 MiB
Dragonfly dress: 98
Mike tee: 72
Wow printed tee 6.99: 70
1pk Fun: 55
TP Paddington Sweater: 54
Other values (45870): 105193

Length

Max length: 30
Median length: 23
Mean length: 15.53556878
Min length: 1

Characters and Unicode

Total characters: 1639655
Distinct characters: 91
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
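
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one of the sample product names; it is not the profiler's own implementation, and the helper name is illustrative. The two-letter codes map to the category names used in this report (for example `Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator, `Pd` = Dash Punctuation, `Ps` = Open Punctuation, `Pe` = Close Punctuation).

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count Unicode general categories of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample prod_name values shown in the report.
counts = category_counts("OP T-shirt (Idro)")
for code, n in counts.most_common():
    print(code, n)
```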

Unique

Unique: 22920
Unique (%): 21.7%

Sample

1st row: Strap top
2nd row: Strap top
3rd row: Strap top (1)
4th row: OP T-shirt (Idro)
5th row: OP T-shirt (Idro)

Common Values

Value                  Count   Frequency (%)
Dragonfly dress        98      0.1%
Mike tee               72      0.1%
Wow printed tee 6.99   70      0.1%
1pk Fun                55      0.1%
TP Paddington Sweater  54      0.1%
Pria tee               51      < 0.1%
Despacito              48      < 0.1%
MY                     44      < 0.1%
Robin 3pk Fancy        43      < 0.1%
DANTE set              42      < 0.1%
Other values (45865)   104965  99.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dress7825
 
2.6%
tee4553
 
1.5%
top3938
 
1.3%
shorts3555
 
1.2%
fancy2796
 
0.9%
ls2336
 
0.8%
hood2294
 
0.8%
sb2252
 
0.8%
set2133
 
0.7%
12043
 
0.7%
Other values (13649)261891
88.6%

Most occurring characters

ValueCountFrequency (%)
190600
 
11.6%
e116144
 
7.1%
a94570
 
5.8%
s79849
 
4.9%
r78145
 
4.8%
i76131
 
4.6%
o67798
 
4.1%
n65393
 
4.0%
t63950
 
3.9%
l58420
 
3.6%
Other values (81)748655
45.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter988791
60.3%
Uppercase Letter418961
25.6%
Space Separator190600
 
11.6%
Decimal Number18512
 
1.1%
Dash Punctuation7701
 
0.5%
Other Punctuation6650
 
0.4%
Open Punctuation3937
 
0.2%
Close Punctuation3914
 
0.2%
Math Symbol537
 
< 0.1%
Connector Punctuation30
 
< 0.1%
Other values (2)22
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e116144
11.7%
a94570
 
9.6%
s79849
 
8.1%
r78145
 
7.9%
i76131
 
7.7%
o67798
 
6.9%
n65393
 
6.6%
t63950
 
6.5%
l58420
 
5.9%
d33369
 
3.4%
Other values (23)255022
25.8%
Uppercase Letter
ValueCountFrequency (%)
S45068
 
10.8%
E32036
 
7.6%
A27062
 
6.5%
T26995
 
6.4%
L25542
 
6.1%
P24833
 
5.9%
B23677
 
5.7%
C22072
 
5.3%
R21928
 
5.2%
I19528
 
4.7%
Other values (23)150220
35.9%
Decimal Number
ValueCountFrequency (%)
25117
27.6%
13776
20.4%
32769
15.0%
52213
12.0%
92071
11.2%
7853
 
4.6%
4516
 
2.8%
0465
 
2.5%
8437
 
2.4%
6295
 
1.6%
Other Punctuation
ValueCountFrequency (%)
.3215
48.3%
/2881
43.3%
&273
 
4.1%
:181
 
2.7%
!44
 
0.7%
'40
 
0.6%
?16
 
0.2%
Space Separator
ValueCountFrequency (%)
190600
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7701
100.0%
Open Punctuation
ValueCountFrequency (%)
(3937
100.0%
Close Punctuation
ValueCountFrequency (%)
)3914
100.0%
Math Symbol
ValueCountFrequency (%)
+537
100.0%
Connector Punctuation
ValueCountFrequency (%)
_30
100.0%
Modifier Symbol
ValueCountFrequency (%)
^21
100.0%
Other Symbol
ValueCountFrequency (%)
©1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1407752
85.9%
Common231903
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e116144
 
8.3%
a94570
 
6.7%
s79849
 
5.7%
r78145
 
5.6%
i76131
 
5.4%
o67798
 
4.8%
n65393
 
4.6%
t63950
 
4.5%
l58420
 
4.1%
S45068
 
3.2%
Other values (56)662284
47.0%
Common
ValueCountFrequency (%)
190600
82.2%
-7701
 
3.3%
25117
 
2.2%
(3937
 
1.7%
)3914
 
1.7%
13776
 
1.6%
.3215
 
1.4%
/2881
 
1.2%
32769
 
1.2%
52213
 
1.0%
Other values (15)5780
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1639490
> 99.9%
None165
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
190600
 
11.6%
e116144
 
7.1%
a94570
 
5.8%
s79849
 
4.9%
r78145
 
4.8%
i76131
 
4.6%
o67798
 
4.1%
n65393
 
4.0%
t63950
 
3.9%
l58420
 
3.6%
Other values (66)748490
45.7%
None
ValueCountFrequency (%)
ö41
24.8%
é35
21.2%
É16
 
9.7%
Ö13
 
7.9%
å12
 
7.3%
ä9
 
5.5%
è9
 
5.5%
í7
 
4.2%
Ä7
 
4.2%
È6
 
3.6%
Other values (5)10
 
6.1%

product_type_no
Real number (ℝ)

HIGH CORRELATION

Distinct: 132
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 234.8618749
Minimum: -1
Maximum: 762
Zeros: 0
Zeros (%): 0.0%
Negative: 121
Negative (%): 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 70
Q1: 252
Median: 259
Q3: 272
95-th percentile: 304
Maximum: 762
Range: 763
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 75.04930803
Coefficient of variation (CV): 0.3195465763
Kurtosis: 1.165582206
Mean: 234.8618749
Median Absolute Deviation (MAD): 13
Skewness: -1.423031301
Sum: 24787792
Variance: 5632.398636
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
272                 11169  10.6%
265                 10362  9.8%
252                 9302   8.8%
255                 7904   7.5%
254                 4155   3.9%
258                 3979   3.8%
262                 3940   3.7%
274                 3939   3.7%
259                 3405   3.2%
253                 2991   2.8%
Other values (122)  44396  42.1%
Value  Count  Frequency (%)
-1     121    0.1%
49     48     < 0.1%
57     662    0.6%
59     1307   1.2%
60     50     < 0.1%
66     1280   1.2%
67     458    0.4%
68     180    0.2%
69     573    0.5%
70     1159   1.1%
Value  Count  Frequency (%)
762    3      < 0.1%
761    5      < 0.1%
532    3      < 0.1%
529    4      < 0.1%
525    1      < 0.1%
523    2      < 0.1%
521    7      < 0.1%
515    6      < 0.1%
514    1      < 0.1%
512    24     < 0.1%

product_type_name
Categorical

HIGH CARDINALITY

Distinct: 131
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 MiB
Trousers: 11169
Dress: 10362
Sweater: 9302
T-shirt: 7904
Top: 4155
Other values (126): 62650

Length

Max length: 24
Median length: 19
Mean length: 7.530878702
Min length: 3

Characters and Unicode

Total characters: 794824
Distinct characters: 51
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: Vest top
2nd row: Vest top
3rd row: Vest top
4th row: Bra
5th row: Bra

Common Values

Value               Count  Frequency (%)
Trousers            11169  10.6%
Dress               10362  9.8%
Sweater             9302   8.8%
T-shirt             7904   7.5%
Top                 4155   3.9%
Blouse              3979   3.8%
Jacket              3940   3.7%
Shorts              3939   3.7%
Shirt               3405   3.2%
Vest top            2991   2.8%
Other values (121)  44396  42.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
trousers11299
 
9.2%
dress10362
 
8.5%
sweater9302
 
7.6%
top8142
 
6.7%
t-shirt7904
 
6.5%
bottom4275
 
3.5%
blouse3979
 
3.3%
jacket3940
 
3.2%
shorts3939
 
3.2%
shirt3854
 
3.1%
Other values (140)55357
45.2%

Most occurring characters

ValueCountFrequency (%)
e87702
 
11.0%
r86934
 
10.9%
s86754
 
10.9%
t65600
 
8.3%
o51144
 
6.4%
a50695
 
6.4%
i40748
 
5.1%
S29213
 
3.7%
T25917
 
3.3%
u23025
 
2.9%
Other values (41)247092
31.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter652636
82.1%
Uppercase Letter110870
 
13.9%
Space Separator16811
 
2.1%
Dash Punctuation7910
 
1.0%
Other Punctuation6597
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e87702
13.4%
r86934
13.3%
s86754
13.3%
t65600
10.1%
o51144
7.8%
a50695
7.8%
i40748
 
6.2%
u23025
 
3.5%
h20431
 
3.1%
n17817
 
2.7%
Other values (15)121786
18.7%
Uppercase Letter
ValueCountFrequency (%)
S29213
26.3%
T25917
23.4%
B12500
11.3%
D10698
 
9.6%
H5688
 
5.1%
J5117
 
4.6%
U3788
 
3.4%
P3513
 
3.2%
V2991
 
2.7%
C2696
 
2.4%
Other values (12)8749
 
7.9%
Other Punctuation
ValueCountFrequency (%)
/6594
> 99.9%
.3
 
< 0.1%
Space Separator
ValueCountFrequency (%)
16811
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7910
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin763506
96.1%
Common31318
 
3.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e87702
11.5%
r86934
11.4%
s86754
11.4%
t65600
 
8.6%
o51144
 
6.7%
a50695
 
6.6%
i40748
 
5.3%
S29213
 
3.8%
T25917
 
3.4%
u23025
 
3.0%
Other values (37)215774
28.3%
Common
ValueCountFrequency (%)
16811
53.7%
-7910
25.3%
/6594
 
21.1%
.3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII794824
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e87702
 
11.0%
r86934
 
10.9%
s86754
 
10.9%
t65600
 
8.3%
o51144
 
6.4%
a50695
 
6.4%
i40748
 
5.1%
S29213
 
3.7%
T25917
 
3.3%
u23025
 
2.9%
Other values (41)247092
31.1%

product_group_name
Categorical

HIGH CORRELATION

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 MiB
Garment Upper body: 42741
Garment Lower body: 19812
Garment Full body: 13292
Accessories: 11158
Underwear: 5490
Other values (14): 13049

Length

Max length: 21
Median length: 18
Mean length: 15.44063975
Min length: 3

Characters and Unicode

Total characters: 1629636
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Garment Upper body
2nd row: Garment Upper body
3rd row: Garment Upper body
4th row: Underwear
5th row: Underwear

Common Values

Value               Count  Frequency (%)
Garment Upper body  42741  40.5%
Garment Lower body  19812  18.8%
Garment Full body   13292  12.6%
Accessories         11158  10.6%
Underwear           5490   5.2%
Shoes               5283   5.0%
Swimwear            3127   3.0%
Socks & Tights      2442   2.3%
Nightwear           1899   1.8%
Unknown             121    0.1%
Other values (9)    177    0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
garment75854
28.9%
body75845
28.9%
upper42741
16.3%
lower19812
 
7.6%
full13292
 
5.1%
accessories11158
 
4.3%
underwear5490
 
2.1%
shoes5283
 
2.0%
swimwear3127
 
1.2%
socks2442
 
0.9%
Other values (16)7102
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e182285
 
11.2%
r165779
 
10.2%
156604
 
9.6%
o114727
 
7.0%
a86526
 
5.3%
p85482
 
5.2%
n81847
 
5.0%
d81398
 
5.0%
t80347
 
4.9%
m79047
 
4.9%
Other values (25)515594
31.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1286698
79.0%
Uppercase Letter183838
 
11.3%
Space Separator156604
 
9.6%
Other Punctuation2496
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e182285
14.2%
r165779
12.9%
o114727
8.9%
a86526
 
6.7%
p85482
 
6.6%
n81847
 
6.4%
d81398
 
6.3%
t80347
 
6.2%
m79047
 
6.1%
y75850
 
5.9%
Other values (11)253410
19.7%
Uppercase Letter
ValueCountFrequency (%)
G75854
41.3%
U48406
26.3%
L19812
 
10.8%
F13307
 
7.2%
A11158
 
6.1%
S10866
 
5.9%
T2442
 
1.3%
N1899
 
1.0%
C49
 
< 0.1%
B25
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
&2442
97.8%
/54
 
2.2%
Space Separator
ValueCountFrequency (%)
156604
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1470536
90.2%
Common159100
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e182285
12.4%
r165779
 
11.3%
o114727
 
7.8%
a86526
 
5.9%
p85482
 
5.8%
n81847
 
5.6%
d81398
 
5.5%
t80347
 
5.5%
m79047
 
5.4%
G75854
 
5.2%
Other values (22)437244
29.7%
Common
ValueCountFrequency (%)
156604
98.4%
&2442
 
1.5%
/54
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1629636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e182285
 
11.2%
r165779
 
10.2%
156604
 
9.6%
o114727
 
7.0%
a86526
 
5.3%
p85482
 
5.2%
n81847
 
5.0%
d81398
 
5.0%
t80347
 
4.9%
m79047
 
4.9%
Other values (25)515594
31.6%

graphical_appearance_no
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1009515.076
Minimum: -1
Maximum: 1010029
Zeros: 0
Zeros (%): 0.0%
Negative: 52
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 1010001
Q1: 1010008
Median: 1010016
Q3: 1010016
95-th percentile: 1010023
Maximum: 1010029
Range: 1010030
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 22413.58578
Coefficient of variation (CV): 0.02220232894
Kurtosis: 2024.749956
Mean: 1009515.076
Median Absolute Deviation (MAD): 1
Skewness: -45.01901161
Sum: 1.065462401 × 10^11
Variance: 502368827.4
Monotonicity: Not monotonic
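
The extreme skewness is what one would expect when a handful of -1 codes sit roughly a million units below every other value. The sketch below is a deliberately rough approximation, not the profiler's calculation: it keeps the 52 reported -1 rows and collapses all remaining rows onto the median code 1010016, then computes the population (uncorrected) skewness. The function name and the two-value simplification are assumptions made for illustration.

```python
def weighted_skewness(value_counts):
    """Population skewness (m3 / m2**1.5) of a {value: count} table."""
    n = sum(value_counts.values())
    mean = sum(v * c for v, c in value_counts.items()) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in value_counts.items()) / n
    return m3 / m2 ** 1.5

# Assumption: 52 rows carry -1 (as reported); every other row is
# approximated by the median code 1010016.
approx_counts = {-1: 52, 1010016: 105542 - 52}
skew = weighted_skewness(approx_counts)
print(round(skew, 2))
```

Even this two-value approximation lands close to the reported value, which suggests the -1 rows alone account for nearly all of the skew.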
Histogram with fixed size bins (bins=30)
Value              Count  Frequency (%)
1010016            49747  47.1%
1010001            17165  16.3%
1010010            5938   5.6%
1010017            4990   4.7%
1010023            4842   4.6%
1010008            3215   3.0%
1010014            3098   2.9%
1010004            2178   2.1%
1010005            1830   1.7%
1010021            1513   1.4%
Other values (20)  11026  10.4%
Value    Count  Frequency (%)
-1       52     < 0.1%
1010001  17165  16.3%
1010002  1341   1.3%
1010003  15     < 0.1%
1010004  2178   2.1%
1010005  1830   1.7%
1010006  681    0.6%
1010007  1165   1.1%
1010008  3215   3.0%
1010009  958    0.9%
Value    Count  Frequency (%)
1010029  8      < 0.1%
1010028  86     0.1%
1010027  66     0.1%
1010026  1502   1.4%
1010025  153    0.1%
1010024  322    0.3%
1010023  4842   4.6%
1010022  830    0.8%
1010021  1513   1.4%
1010020  376    0.4%

graphical_appearance_name
Categorical

HIGH CORRELATION

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 MiB
Solid: 49747
All over pattern: 17165
Melange: 5938
Stripe: 4990
Denim: 4842
Other values (25): 22860

Length

Max length: 19
Median length: 5
Mean length: 8.285857763
Min length: 3

Characters and Unicode

Total characters: 874506
Distinct characters: 42
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Solid
2nd row: Solid
3rd row: Stripe
4th row: Solid
5th row: Solid

Common Values

Value              Count  Frequency (%)
Solid              49747  47.1%
All over pattern   17165  16.3%
Melange            5938   5.6%
Stripe             4990   4.7%
Denim              4842   4.6%
Front print        3215   3.0%
Placement print    3098   2.9%
Check              2178   2.1%
Colour blocking    1830   1.7%
Lace               1513   1.4%
Other values (20)  11026  10.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
solid49747
32.9%
pattern17680
 
11.7%
all17165
 
11.4%
over17165
 
11.4%
print6313
 
4.2%
melange5938
 
3.9%
stripe4990
 
3.3%
denim4842
 
3.2%
front3215
 
2.1%
placement3098
 
2.0%
Other values (25)21011
13.9%

Most occurring characters

ValueCountFrequency (%)
l102988
11.8%
o80380
 
9.2%
e77881
 
8.9%
i77859
 
8.9%
t67513
 
7.7%
r62943
 
7.2%
S55696
 
6.4%
d54006
 
6.2%
n48443
 
5.5%
45622
 
5.2%
Other values (32)201175
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter716271
81.9%
Uppercase Letter107841
 
12.3%
Space Separator45622
 
5.2%
Other Punctuation3431
 
0.4%
Decimal Number1341
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l102988
14.4%
o80380
11.2%
e77881
10.9%
i77859
10.9%
t67513
9.4%
r62943
8.8%
d54006
7.5%
n48443
6.8%
a35452
 
4.9%
p32949
 
4.6%
Other values (13)75857
10.6%
Uppercase Letter
ValueCountFrequency (%)
S55696
51.6%
A18521
 
17.2%
M8460
 
7.8%
D6864
 
6.4%
C4706
 
4.4%
F3215
 
3.0%
P3098
 
2.9%
O2017
 
1.9%
L1513
 
1.4%
E1165
 
1.1%
Other values (6)2586
 
2.4%
Space Separator
ValueCountFrequency (%)
45622
100.0%
Other Punctuation
ValueCountFrequency (%)
/3431
100.0%
Decimal Number
ValueCountFrequency (%)
31341
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin824112
94.2%
Common50394
 
5.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l102988
12.5%
o80380
9.8%
e77881
9.5%
i77859
9.4%
t67513
8.2%
r62943
 
7.6%
S55696
 
6.8%
d54006
 
6.6%
n48443
 
5.9%
a35452
 
4.3%
Other values (29)160951
19.5%
Common
ValueCountFrequency (%)
45622
90.5%
/3431
 
6.8%
31341
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII874506
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l102988
11.8%
o80380
 
9.2%
e77881
 
8.9%
i77859
 
8.9%
t67513
 
7.7%
r62943
 
7.2%
S55696
 
6.4%
d54006
 
6.2%
n48443
 
5.5%
45622
 
5.2%
Other values (32)201175
23.0%

colour_group_code
Real number (ℝ)

HIGH CORRELATION

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.23382161
Minimum: -1
Maximum: 93
Zeros: 0
Zeros (%): 0.0%
Negative: 28
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 7
Q1: 9
Median: 14
Q3: 52
95-th percentile: 81
Maximum: 93
Range: 94
Interquartile range (IQR): 43

Descriptive statistics

Standard deviation: 28.08615412
Coefficient of variation (CV): 0.8713256053
Kurtosis: -1.061047111
Mean: 32.23382161
Median Absolute Deviation (MAD): 7
Skewness: 0.7138227017
Sum: 3402022
Variance: 788.8320534
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
9                  22670  21.5%
73                 12171  11.5%
10                 9542   9.0%
51                 5811   5.5%
7                  4487   4.3%
12                 3356   3.2%
72                 3308   3.1%
42                 3056   2.9%
71                 3012   2.9%
19                 2767   2.6%
Other values (40)  35362  33.5%
Value  Count  Frequency (%)
-1     28     < 0.1%
1      105    0.1%
2      31     < 0.1%
3      709    0.7%
4      94     0.1%
5      1377   1.3%
6      2105   2.0%
7      4487   4.3%
8      2731   2.6%
9      22670  21.5%
Value  Count  Frequency (%)
93     2106   2.0%
92     815    0.8%
91     681    0.6%
90     129    0.1%
83     473    0.4%
82     435    0.4%
81     1027   1.0%
80     14     < 0.1%
73     12171  11.5%
72     3308   3.1%

colour_group_name
Categorical

HIGH CORRELATION

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 MiB
Black: 22670
Dark Blue: 12171
White: 9542
Light Pink: 5811
Grey: 4487
Other values (45): 50861

Length

Max length: 15
Median length: 14
Mean length: 7.480510129
Min length: 3

Characters and Unicode

Total characters: 789508
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Black
2nd row: White
3rd row: Off White
4th row: Black
5th row: White

Common Values

Value              Count  Frequency (%)
Black              22670  21.5%
Dark Blue          12171  11.5%
White              9542   9.0%
Light Pink         5811   5.5%
Grey               4487   4.3%
Light Beige        3356   3.2%
Blue               3308   3.1%
Red                3056   2.9%
Light Blue         3012   2.9%
Greenish Khaki     2767   2.6%
Other values (40)  35362  33.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dark23498
15.0%
black22670
14.4%
light19334
12.3%
blue18542
11.8%
white12268
7.8%
pink9442
 
6.0%
grey9323
 
5.9%
beige7378
 
4.7%
red5795
 
3.7%
green3731
 
2.4%
Other values (16)25065
16.0%

Most occurring characters

ValueCountFrequency (%)
e87703
 
11.1%
k58405
 
7.4%
i58311
 
7.4%
l54192
 
6.9%
a52335
 
6.6%
51504
 
6.5%
B50155
 
6.4%
r49945
 
6.3%
h40420
 
5.1%
t33220
 
4.2%
Other values (28)253318
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter580770
73.6%
Uppercase Letter157140
 
19.9%
Space Separator51504
 
6.5%
Other Punctuation94
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e87703
15.1%
k58405
10.1%
i58311
10.0%
l54192
9.3%
a52335
9.0%
r49945
8.6%
h40420
7.0%
t33220
 
5.7%
g30050
 
5.2%
u23536
 
4.1%
Other values (12)92653
16.0%
Uppercase Letter
ValueCountFrequency (%)
B50155
31.9%
D23498
15.0%
L19334
 
12.3%
G17424
 
11.1%
W12268
 
7.8%
P10538
 
6.7%
O7651
 
4.9%
R5795
 
3.7%
Y4899
 
3.1%
K2767
 
1.8%
Other values (4)2811
 
1.8%
Space Separator
ValueCountFrequency (%)
51504
100.0%
Other Punctuation
ValueCountFrequency (%)
/94
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin737910
93.5%
Common51598
 
6.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e87703
 
11.9%
k58405
 
7.9%
i58311
 
7.9%
l54192
 
7.3%
a52335
 
7.1%
B50155
 
6.8%
r49945
 
6.8%
h40420
 
5.5%
t33220
 
4.5%
g30050
 
4.1%
Other values (26)223174
30.2%
Common
ValueCountFrequency (%)
51504
99.8%
/94
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII789508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e87703
 
11.1%
k58405
 
7.4%
i58311
 
7.4%
l54192
 
6.9%
a52335
 
6.6%
51504
 
6.5%
B50155
 
6.4%
r49945
 
6.3%
h40420
 
5.1%
t33220
 
4.2%
Other values (28)253318
32.1%

perceived_colour_value_id
Real number (ℝ)

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.20618332
Minimum: -1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 28
Negative (%): < 0.1%
Memory size: 824.7 KiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 4
95-th percentile: 7
Maximum: 7
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.563838925
Coefficient of variation (CV): 0.487757177
Kurtosis: -0.09488198544
Mean: 3.20618332
Median Absolute Deviation (MAD): 1
Skewness: 0.2739994538
Sum: 338387
Variance: 2.445592185
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value  Count  Frequency (%)
4      42706  40.5%
1      22152  21.0%
3      15739  14.9%
2      12630  12.0%
5      6471   6.1%
7      5711   5.4%
6      105    0.1%
-1     28     < 0.1%
Value  Count  Frequency (%)
-1     28     < 0.1%
1      22152  21.0%
2      12630  12.0%
3      15739  14.9%
4      42706  40.5%
5      6471   6.1%
6      105    0.1%
7      5711   5.4%
Value  Count  Frequency (%)
7      5711   5.4%
6      105    0.1%
5      6471   6.1%
4      42706  40.5%
3      15739  14.9%
2      12630  12.0%
1      22152  21.0%
-1     28     < 0.1%

perceived_colour_value_name
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
Dark: 42706
Dusty Light: 22152
Light: 15739
Medium Dusty: 12630
Bright: 6471
Other values (3): 5844

Length

Max length: 12
Median length: 11
Mean length: 6.812302211
Min length: 4

Characters and Unicode

Total characters: 718984
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Dark
2nd row: Light
3rd row: Dusty Light
4th row: Dark
5th row: Light

Common Values

Value         Count  Frequency (%)
Dark          42706  40.5%
Dusty Light   22152  21.0%
Light         15739  14.9%
Medium Dusty  12630  12.0%
Bright        6471   6.1%
Medium        5711   5.4%
Undefined     105    0.1%
Unknown       28     < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
dark42706
30.4%
light37891
27.0%
dusty34782
24.8%
medium18341
13.1%
bright6471
 
4.6%
undefined105
 
0.1%
unknown28
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
t79144
11.0%
D77488
10.8%
i62808
 
8.7%
u53123
 
7.4%
r49177
 
6.8%
h44362
 
6.2%
g44362
 
6.2%
k42734
 
5.9%
a42706
 
5.9%
L37891
 
5.3%
Other values (13)185189
25.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter543878
75.6%
Uppercase Letter140324
 
19.5%
Space Separator34782
 
4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t79144
14.6%
i62808
11.5%
u53123
9.8%
r49177
9.0%
h44362
8.2%
g44362
8.2%
k42734
7.9%
a42706
7.9%
y34782
6.4%
s34782
6.4%
Other values (7)55898
10.3%
Uppercase Letter
ValueCountFrequency (%)
D77488
55.2%
L37891
27.0%
M18341
 
13.1%
B6471
 
4.6%
U133
 
0.1%
Space Separator
ValueCountFrequency (%)
34782
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin684202
95.2%
Common34782
 
4.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
t79144
11.6%
D77488
11.3%
i62808
9.2%
u53123
 
7.8%
r49177
 
7.2%
h44362
 
6.5%
g44362
 
6.5%
k42734
 
6.2%
a42706
 
6.2%
L37891
 
5.5%
Other values (12)150407
22.0%
Common
ValueCountFrequency (%)
34782
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII718984
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t79144
11.0%
D77488
10.8%
i62808
 
8.7%
u53123
 
7.4%
r49177
 
6.8%
h44362
 
6.2%
g44362
 
6.2%
k42734
 
5.9%
a42706
 
5.9%
L37891
 
5.3%
Other values (13)185189
25.8%

perceived_colour_master_id
Real number (ℝ)

HIGH CORRELATION

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7.807972182
Minimum-1
Maximum20
Zeros0
Zeros (%)0.0%
Negative685
Negative (%)0.6%
Memory size824.7 KiB

Quantile statistics

Minimum -1
5-th percentile 2
Q1 4
median 5
Q3 11
95-th percentile 19
Maximum 20
Range 21
Interquartile range (IQR) 7

Descriptive statistics

Standard deviation5.376727008
Coefficient of variation (CV)0.6886201543
Kurtosis-0.3620404284
Mean7.807972182
Median Absolute Deviation (MAD)3
Skewness0.8013795222
Sum824069
Variance28.90919332
MonotonicityNot monotonic
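
Several of the figures above follow from one another, which gives a quick consistency check on the report. The sketch below recomputes the range, IQR, coefficient of variation, variance, and mean from the other reported values. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, mean = sum / number of observations); the variable names are illustrative.

```python
import math

# Figures copied from the report for perceived_colour_master_id.
minimum, maximum = -1, 20
q1, q3 = 4, 11
mean, std = 7.807972182, 5.376727008
total, n_rows = 824069, 105542  # "Sum" and the dataset's number of observations

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
mean_from_sum = total / n_rows    # Mean, recomputed from Sum

print(value_range, iqr)
print(round(cv, 6), round(variance, 4), round(mean_from_sum, 6))

# Compare against the reported values with a loose relative tolerance.
print(math.isclose(cv, 0.6886201543, rel_tol=1e-6))
print(math.isclose(variance, 28.90919332, rel_tol=1e-6))
```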
Histogram with fixed size bins (bins=20)
Value              Count  Frequency (%)
5                  22585  21.4%
2                  18469  17.5%
9                  12665  12.0%
4                  9403   8.9%
12                 8924   8.5%
18                 5878   5.6%
11                 5657   5.4%
19                 3526   3.3%
20                 3181   3.0%
8                  3121   3.0%
Other values (10)  12133  11.5%

Value  Count  Frequency (%)
-1     685    0.6%
1      1223   1.2%
2      18469  17.5%
3      2734   2.6%
4      9403   8.9%
5      22585  21.4%
6      1100   1.0%
7      1829   1.7%
8      3121   3.0%
9      12665  12.0%

Value  Count  Frequency (%)
20     3181   3.0%
19     3526   3.3%
18     5878   5.6%
16     3      < 0.1%
15     2180   2.1%
14     105    0.1%
13     2269   2.1%
12     8924   8.5%
11     5657   5.4%
10     5      < 0.1%

perceived_colour_master_name
Categorical

HIGH CORRELATION

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.2 MiB
Black
22585 
Blue
18469 
White
12665 
Pink
9403 
Grey
8924 
Other values (15)
33496 

Length

Max length15
Median length12
Mean length4.924608213
Min length3

Characters and Unicode

Total characters519753
Distinct characters33
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
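
As a minimal sketch of the kind of analysis this refers to, Python's standard unicodedata module exposes the general category of each character. The two-letter codes it returns correspond to the category names used in this report: Ll is Lowercase Letter, Lu is Uppercase Letter, and Zs is Space Separator. The helper category_counts and the sample values are illustrative only; this is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# A few category values from this variable, used purely as sample input.
sample = ["Black", "White", "Khaki green"]
counts = category_counts(sample)
for category, n in counts.most_common():
    print(category, n)
```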

Unique

Unique0
Unique (%)0.0%

Sample

1st rowBlack
2nd rowWhite
3rd rowWhite
4th rowBlack
5th rowWhite

Common Values

ValueCountFrequency (%)
Black22585
21.4%
Blue18469
17.5%
White12665
12.0%
Pink9403
8.9%
Grey8924
 
8.5%
Red5878
 
5.6%
Beige5657
 
5.4%
Green3526
 
3.3%
Khaki green3181
 
3.0%
Yellow3121
 
3.0%
Other values (10)12133
11.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
black22585
20.6%
blue18469
16.8%
white12665
11.5%
pink9403
8.6%
grey8924
 
8.1%
green6715
 
6.1%
red5878
 
5.4%
beige5657
 
5.2%
khaki3181
 
2.9%
yellow3121
 
2.8%
Other values (11)13233
12.0%

Most occurring characters

ValueCountFrequency (%)
e83082
16.0%
l52912
 
10.2%
B48983
 
9.4%
k35854
 
6.9%
i33948
 
6.5%
a31780
 
6.1%
c23685
 
4.6%
r23571
 
4.5%
n23386
 
4.5%
u23335
 
4.5%
Other values (23)139217
26.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter408919
78.7%
Uppercase Letter106545
 
20.5%
Space Separator4289
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e83082
20.3%
l52912
12.9%
k35854
8.8%
i33948
8.3%
a31780
 
7.8%
c23685
 
5.8%
r23571
 
5.8%
n23386
 
5.7%
u23335
 
5.7%
h15854
 
3.9%
Other values (10)61512
15.0%
Uppercase Letter
ValueCountFrequency (%)
B48983
46.0%
W12665
 
11.9%
G12458
 
11.7%
P10503
 
9.9%
R5878
 
5.5%
M3403
 
3.2%
K3181
 
3.0%
Y3126
 
2.9%
O2734
 
2.6%
T1829
 
1.7%
Other values (2)1785
 
1.7%
Space Separator
ValueCountFrequency (%)
4289
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin515464
99.2%
Common4289
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e83082
16.1%
l52912
 
10.3%
B48983
 
9.5%
k35854
 
7.0%
i33948
 
6.6%
a31780
 
6.2%
c23685
 
4.6%
r23571
 
4.6%
n23386
 
4.5%
u23335
 
4.5%
Other values (22)134928
26.2%
Common
ValueCountFrequency (%)
4289
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII519753
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e83082
16.0%
l52912
 
10.2%
B48983
 
9.4%
k35854
 
6.9%
i33948
 
6.5%
a31780
 
6.1%
c23685
 
4.6%
r23571
 
4.5%
n23386
 
4.5%
u23335
 
4.5%
Other values (23)139217
26.8%

department_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct299
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4532.777833
Minimum1201
Maximum9989
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum 1201
5-th percentile 1338
Q1 1676
median 4222
Q3 7389
95-th percentile 8748
Maximum 9989
Range 8788
Interquartile range (IQR) 5713

Descriptive statistics

Standard deviation2712.692011
Coefficient of variation (CV)0.5984612773
Kurtosis-1.396426686
Mean4532.777833
Median Absolute Deviation (MAD)2556
Skewness0.2713538704
Sum478398438
Variance7358697.949
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
7616                2032   1.9%
1338                1921   1.8%
8716                1874   1.8%
4242                1839   1.7%
7648                1488   1.4%
1640                1429   1.4%
1636                1402   1.3%
1676                1359   1.3%
1344                1354   1.3%
1643                1339   1.3%
Other values (289)  89505  84.8%

Value  Count  Frequency (%)
1201   829    0.8%
1202   16     < 0.1%
1212   299    0.3%
1222   238    0.2%
1241   87     0.1%
1244   667    0.6%
1310   251    0.2%
1313   630    0.6%
1322   1206   1.1%
1334   864    0.8%

Value  Count  Frequency (%)
9989   122    0.1%
9986   513    0.5%
9985   579    0.5%
9984   236    0.2%
9020   33     < 0.1%
8956   363    0.3%
8917   421    0.4%
8888   269    0.3%
8852   281    0.3%
8815   21     < 0.1%

department_name
Categorical

HIGH CARDINALITY

Distinct250
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
Jersey
 
4604
Knitwear
 
3503
Trouser
 
2655
Blouse
 
2362
Dress
 
2087
Other values (245)
90331 

Length

Max length40
Median length26
Mean length13.14021906
Min length2

Characters and Unicode

Total characters1386845
Distinct characters60
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowJersey Basic
2nd rowJersey Basic
3rd rowJersey Basic
4th rowClean Lingerie
5th rowClean Lingerie

Common Values

ValueCountFrequency (%)
Jersey4604
 
4.4%
Knitwear3503
 
3.3%
Trouser2655
 
2.5%
Blouse2362
 
2.2%
Dress2087
 
2.0%
Swimwear2075
 
2.0%
Kids Girl Jersey Fancy2032
 
1.9%
Expressive Lingerie1921
 
1.8%
Young Girl Jersey Fancy1874
 
1.8%
Jersey Fancy1754
 
1.7%
Other values (240)80675
76.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
jersey24170
 
10.5%
girl16349
 
7.1%
kids14307
 
6.2%
fancy13087
 
5.7%
boy11674
 
5.1%
young10428
 
4.5%
baby7973
 
3.5%
knitwear7498
 
3.2%
basic7078
 
3.1%
woven6640
 
2.9%
Other values (132)111638
48.4%

Most occurring characters

ValueCountFrequency (%)
e142984
 
10.3%
s126069
 
9.1%
125300
 
9.0%
r110268
 
8.0%
i87155
 
6.3%
o77105
 
5.6%
a65051
 
4.7%
y61342
 
4.4%
n54902
 
4.0%
c42943
 
3.1%
Other values (50)493726
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1018643
73.5%
Uppercase Letter229916
 
16.6%
Space Separator125300
 
9.0%
Other Punctuation9941
 
0.7%
Decimal Number2079
 
0.1%
Math Symbol615
 
< 0.1%
Dash Punctuation351
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e142984
14.0%
s126069
12.4%
r110268
10.8%
i87155
8.6%
o77105
 
7.6%
a65051
 
6.4%
y61342
 
6.0%
n54902
 
5.4%
c42943
 
4.2%
t41139
 
4.0%
Other values (16)209685
20.6%
Uppercase Letter
ValueCountFrequency (%)
B37938
16.5%
J27069
11.8%
K22977
10.0%
S21805
9.5%
G18285
8.0%
T14115
 
6.1%
F12780
 
5.6%
D12281
 
5.3%
W10518
 
4.6%
Y10428
 
4.5%
Other values (13)41720
18.1%
Decimal Number
Value  Count  Frequency (%)
1      1748   84.1%
5      202    9.7%
6      64     3.1%
2      64     3.1%
7      1      < 0.1%
Other Punctuation
ValueCountFrequency (%)
/5742
57.8%
&4073
41.0%
.126
 
1.3%
Space Separator
ValueCountFrequency (%)
125300
100.0%
Math Symbol
ValueCountFrequency (%)
+615
100.0%
Dash Punctuation
ValueCountFrequency (%)
-351
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1248559
90.0%
Common138286
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e142984
 
11.5%
s126069
 
10.1%
r110268
 
8.8%
i87155
 
7.0%
o77105
 
6.2%
a65051
 
5.2%
y61342
 
4.9%
n54902
 
4.4%
c42943
 
3.4%
t41139
 
3.3%
Other values (39)439601
35.2%
Common
ValueCountFrequency (%)
125300
90.6%
/5742
 
4.2%
&4073
 
2.9%
11748
 
1.3%
+615
 
0.4%
-351
 
0.3%
5202
 
0.1%
.126
 
0.1%
664
 
< 0.1%
264
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1386845
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e142984
 
10.3%
s126069
 
9.1%
125300
 
9.0%
r110268
 
8.0%
i87155
 
6.3%
o77105
 
5.6%
a65051
 
4.7%
y61342
 
4.4%
n54902
 
4.0%
c42943
 
3.1%
Other values (50)493726
35.6%

index_code
Categorical

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.8 MiB
A
26001 
D
15149 
F
12553 
H
12007 
I
9214 
Other values (5)
30618 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105542
Distinct characters10
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowB
5th rowB

Common Values

ValueCountFrequency (%)
A26001
24.6%
D15149
14.4%
F12553
11.9%
H12007
11.4%
I9214
 
8.7%
G8875
 
8.4%
C6961
 
6.6%
B6775
 
6.4%
J4615
 
4.4%
S3392
 
3.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
a26001
24.6%
d15149
14.4%
f12553
11.9%
h12007
11.4%
i9214
 
8.7%
g8875
 
8.4%
c6961
 
6.6%
b6775
 
6.4%
j4615
 
4.4%
s3392
 
3.2%

Most occurring characters

ValueCountFrequency (%)
A26001
24.6%
D15149
14.4%
F12553
11.9%
H12007
11.4%
I9214
 
8.7%
G8875
 
8.4%
C6961
 
6.6%
B6775
 
6.4%
J4615
 
4.4%
S3392
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter105542
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A26001
24.6%
D15149
14.4%
F12553
11.9%
H12007
11.4%
I9214
 
8.7%
G8875
 
8.4%
C6961
 
6.6%
B6775
 
6.4%
J4615
 
4.4%
S3392
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
Latin105542
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A26001
24.6%
D15149
14.4%
F12553
11.9%
H12007
11.4%
I9214
 
8.7%
G8875
 
8.4%
C6961
 
6.6%
B6775
 
6.4%
J4615
 
4.4%
S3392
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII105542
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A26001
24.6%
D15149
14.4%
F12553
11.9%
H12007
11.4%
I9214
 
8.7%
G8875
 
8.4%
C6961
 
6.6%
B6775
 
6.4%
J4615
 
4.4%
S3392
 
3.2%

index_name
Categorical

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.1 MiB
Ladieswear
26001 
Divided
15149 
Menswear
12553 
Children Sizes 92-140
12007 
Children Sizes 134-170
9214 
Other values (5)
30618 

Length

Max length30
Median length21
Mean length13.76172519
Min length5

Characters and Unicode

Total characters1452440
Distinct characters41
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowLadieswear
2nd rowLadieswear
3rd rowLadieswear
4th rowLingeries/Tights
5th rowLingeries/Tights

Common Values

ValueCountFrequency (%)
Ladieswear26001
24.6%
Divided15149
14.4%
Menswear12553
11.9%
Children Sizes 92-14012007
11.4%
Children Sizes 134-1709214
 
8.7%
Baby Sizes 50-988875
 
8.4%
Ladies Accessories6961
 
6.6%
Lingeries/Tights6775
 
6.4%
Children Accessories, Swimwear4615
 
4.4%
Sport3392
 
3.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
sizes30096
16.5%
ladieswear26001
14.3%
children25836
14.2%
divided15149
8.3%
menswear12553
6.9%
92-14012007
 
6.6%
accessories11576
 
6.4%
134-1709214
 
5.1%
baby8875
 
4.9%
50-988875
 
4.9%
Other values (4)21743
12.0%

Most occurring characters

ValueCountFrequency (%)
e196467
 
13.5%
i155708
 
10.7%
s123889
 
8.5%
r90748
 
6.2%
d89096
 
6.1%
a85006
 
5.9%
76383
 
5.3%
w47784
 
3.3%
n45164
 
3.1%
L39737
 
2.7%
Other values (31)502458
34.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1025148
70.6%
Uppercase Letter158604
 
10.9%
Decimal Number150819
 
10.4%
Space Separator76383
 
5.3%
Dash Punctuation30096
 
2.1%
Other Punctuation11390
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e196467
19.2%
i155708
15.2%
s123889
12.1%
r90748
8.9%
d89096
8.7%
a85006
8.3%
w47784
 
4.7%
n45164
 
4.4%
h32611
 
3.2%
z30096
 
2.9%
Other values (10)128579
12.5%
Decimal Number
Value  Count  Frequency (%)
1      30435  20.2%
0      30096  20.0%
4      21221  14.1%
9      20882  13.8%
2      12007  8.0%
3      9214   6.1%
7      9214   6.1%
8      8875   5.9%
5      8875   5.9%
Uppercase Letter
ValueCountFrequency (%)
L39737
25.1%
S38103
24.0%
C25836
16.3%
D15149
 
9.6%
M12553
 
7.9%
A11576
 
7.3%
B8875
 
5.6%
T6775
 
4.3%
Other Punctuation
ValueCountFrequency (%)
/6775
59.5%
,4615
40.5%
Space Separator
ValueCountFrequency (%)
76383
100.0%
Dash Punctuation
ValueCountFrequency (%)
-30096
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1183752
81.5%
Common268688
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e196467
16.6%
i155708
13.2%
s123889
10.5%
r90748
 
7.7%
d89096
 
7.5%
a85006
 
7.2%
w47784
 
4.0%
n45164
 
3.8%
L39737
 
3.4%
S38103
 
3.2%
Other values (18)272050
23.0%
Common
ValueCountFrequency (%)
76383
28.4%
130435
 
11.3%
030096
 
11.2%
-30096
 
11.2%
421221
 
7.9%
920882
 
7.8%
212007
 
4.5%
39214
 
3.4%
79214
 
3.4%
88875
 
3.3%
Other values (3)20265
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1452440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e196467
 
13.5%
i155708
 
10.7%
s123889
 
8.5%
r90748
 
6.2%
d89096
 
6.1%
a85006
 
5.9%
76383
 
5.3%
w47784
 
3.3%
n45164
 
3.1%
L39737
 
2.7%
Other values (31)502458
34.6%

index_group_no
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.8 MiB
1
39737 
4
34711 
2
15149 
3
12553 
26
 
3392

Length

Max length2
Median length1
Mean length1.032138864
Min length1

Characters and Unicode

Total characters108934
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value  Count  Frequency (%)
1      39737  37.7%
4      34711  32.9%
2      15149  14.4%
3      12553  11.9%
26     3392   3.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
139737
37.7%
434711
32.9%
215149
 
14.4%
312553
 
11.9%
263392
 
3.2%

Most occurring characters

Value  Count  Frequency (%)
1      39737  36.5%
4      34711  31.9%
2      18541  17.0%
3      12553  11.5%
6      3392   3.1%
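
Because index_group_no is profiled as text, the character table counts digits rather than values: the value "26" contributes one "2" and one "6" per row. The sketch below rebuilds the character counts, the total character count, and the mean length from the Common Values table. Variable names are illustrative.

```python
from collections import Counter

# Value counts copied from the "Common Values" table of index_group_no.
value_counts = {"1": 39737, "4": 34711, "2": 15149, "3": 12553, "26": 3392}

# Count every character of every value, weighted by the number of rows.
chars = Counter()
for value, n in value_counts.items():
    for ch in value:
        chars[ch] += n

n_rows = sum(value_counts.values())   # number of observations
total_chars = sum(chars.values())     # "Total characters"
mean_length = total_chars / n_rows    # "Mean length"

print(chars["2"], total_chars, round(mean_length, 9))
```

The count for "2" is therefore 15149 + 3392 = 18541, and the 3392 two-character values account for the mean length being slightly above 1.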

Most occurring categories

ValueCountFrequency (%)
Decimal Number108934
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
139737
36.5%
434711
31.9%
218541
17.0%
312553
 
11.5%
63392
 
3.1%

Most occurring scripts

ValueCountFrequency (%)
Common108934
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
139737
36.5%
434711
31.9%
218541
17.0%
312553
 
11.5%
63392
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII108934
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
139737
36.5%
434711
31.9%
218541
17.0%
312553
 
11.5%
63392
 
3.1%

index_group_name
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.8 MiB
Ladieswear
39737 
Baby/Children
34711 
Divided
15149 
Menswear
12553 
Sport
 
3392

Length

Max length13
Median length10
Mean length10.15747285
Min length5

Characters and Unicode

Total characters1072040
Distinct characters23
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowLadieswear
2nd rowLadieswear
3rd rowLadieswear
4th rowLadieswear
5th rowLadieswear

Common Values

ValueCountFrequency (%)
Ladieswear39737
37.7%
Baby/Children34711
32.9%
Divided15149
 
14.4%
Menswear12553
 
11.9%
Sport3392
 
3.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ladieswear39737
37.7%
baby/children34711
32.9%
divided15149
 
14.4%
menswear12553
 
11.9%
sport3392
 
3.2%

Most occurring characters

ValueCountFrequency (%)
e154440
14.4%
a126738
11.8%
d104746
 
9.8%
i104746
 
9.8%
r90393
 
8.4%
s52290
 
4.9%
w52290
 
4.9%
n47264
 
4.4%
L39737
 
3.7%
C34711
 
3.2%
Other values (13)264685
24.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter897076
83.7%
Uppercase Letter140253
 
13.1%
Other Punctuation34711
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e154440
17.2%
a126738
14.1%
d104746
11.7%
i104746
11.7%
r90393
10.1%
s52290
 
5.8%
w52290
 
5.8%
n47264
 
5.3%
l34711
 
3.9%
h34711
 
3.9%
Other values (6)94747
10.6%
Uppercase Letter
ValueCountFrequency (%)
L39737
28.3%
C34711
24.7%
B34711
24.7%
D15149
 
10.8%
M12553
 
9.0%
S3392
 
2.4%
Other Punctuation
ValueCountFrequency (%)
/34711
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1037329
96.8%
Common34711
 
3.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e154440
14.9%
a126738
12.2%
d104746
10.1%
i104746
10.1%
r90393
 
8.7%
s52290
 
5.0%
w52290
 
5.0%
n47264
 
4.6%
L39737
 
3.8%
C34711
 
3.3%
Other values (12)229974
22.2%
Common
ValueCountFrequency (%)
/34711
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1072040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e154440
14.4%
a126738
11.8%
d104746
 
9.8%
i104746
 
9.8%
r90393
 
8.4%
s52290
 
4.9%
w52290
 
4.9%
n47264
 
4.4%
L39737
 
3.7%
C34711
 
3.2%
Other values (13)264685
24.7%

section_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct57
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean42.66421898
Minimum2
Maximum97
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum 2
5-th percentile 6
Q1 20
median 46
Q3 61
95-th percentile 77
Maximum 97
Range 95
Interquartile range (IQR) 41

Descriptive statistics

Standard deviation23.26010496
Coefficient of variation (CV)0.5451899862
Kurtosis-1.100068347
Mean42.66421898
Median Absolute Deviation (MAD)20
Skewness-0.08453543188
Sum4502867
Variance541.0324827
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
15                 7295   6.9%
53                 7124   6.7%
44                 4932   4.7%
76                 4469   4.2%
77                 3899   3.7%
61                 3598   3.4%
79                 3490   3.3%
11                 3376   3.2%
46                 3328   3.2%
66                 3270   3.1%
Other values (47)  60761  57.6%

Value  Count  Frequency (%)
2      2337   2.2%
4      3      < 0.1%
5      1894   1.8%
6      2725   2.6%
8      2266   2.1%
11     3376   3.2%
14     1270   1.2%
15     7295   6.9%
16     1581   1.5%
17     1      < 0.1%

Value  Count  Frequency (%)
97     559    0.5%
82     682    0.6%
80     35     < 0.1%
79     3490   3.3%
77     3899   3.7%
76     4469   4.2%
72     2034   1.9%
71     26     < 0.1%
70     280    0.3%
66     3270   3.1%
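
section_no has 57 distinct values, while section_name is reported with 56. If every number carries a single name, at least one name must be shared by two numbers. The sketch below shows one way to find such names from (number, name) pairs. The function shared_names and the rows are hypothetical; the report does not say which section is affected.

```python
from collections import defaultdict


def shared_names(rows):
    """Return the names attached to more than one id, with their ids."""
    ids_by_name = defaultdict(set)
    for id_, name in rows:
        ids_by_name[name].add(id_)
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


# Hypothetical (section_no, section_name) pairs, not taken from the dataset.
rows = [
    (15, "Section A"),
    (53, "Section B"),
    (54, "Section B"),
    (15, "Section A"),
]
print(shared_names(rows))
```

The same check applies to department_no (299 distinct values) and department_name (250 distinct values).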

section_name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct56
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size7.4 MiB
Womens Everyday Collection
 
7295
Divided Collection
 
7124
Baby Essentials & Complements
 
4932
Kids Girl
 
4469
Young Girl
 
3899
Other values (51)
77823 

Length

Max length30
Median length22
Mean length16.74306911
Min length4

Characters and Unicode

Total characters1767097
Distinct characters48
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowWomens Everyday Basics
2nd rowWomens Everyday Basics
3rd rowWomens Everyday Basics
4th rowWomens Lingerie
5th rowWomens Lingerie

Common Values

ValueCountFrequency (%)
Womens Everyday Collection7295
 
6.9%
Divided Collection7124
 
6.7%
Baby Essentials & Complements4932
 
4.7%
Kids Girl4469
 
4.2%
Young Girl3899
 
3.7%
Womens Lingerie3598
 
3.4%
Girls Underwear & Basics3490
 
3.3%
Womens Tailoring3376
 
3.2%
Kids Boy3328
 
3.2%
Womens Small accessories3270
 
3.1%
Other values (46)60761
57.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
womens33662
 
12.8%
17323
 
6.6%
kids15153
 
5.8%
collection14419
 
5.5%
divided14275
 
5.4%
baby10551
 
4.0%
girl10128
 
3.9%
accessories9735
 
3.7%
everyday8876
 
3.4%
basics8828
 
3.4%
Other values (49)120028
45.6%

Most occurring characters

ValueCountFrequency (%)
e182527
 
10.3%
157436
 
8.9%
s142303
 
8.1%
i130588
 
7.4%
o123340
 
7.0%
n99911
 
5.7%
r93569
 
5.3%
a92150
 
5.2%
l72523
 
4.1%
d67367
 
3.8%
Other values (38)605383
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1336032
75.6%
Uppercase Letter243540
 
13.8%
Space Separator157436
 
8.9%
Other Punctuation27562
 
1.6%
Math Symbol2337
 
0.1%
Decimal Number190
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e182527
13.7%
s142303
10.7%
i130588
9.8%
o123340
9.2%
n99911
 
7.5%
r93569
 
7.0%
a92150
 
6.9%
l72523
 
5.4%
d67367
 
5.0%
m63470
 
4.8%
Other values (12)268284
20.1%
Uppercase Letter
ValueCountFrequency (%)
W33662
13.8%
B30475
12.5%
C29740
12.2%
S22980
9.4%
D17628
7.2%
M15966
 
6.6%
K15153
 
6.2%
E14164
 
5.8%
G13618
 
5.6%
T8992
 
3.7%
Other values (11)41162
16.9%
Other Punctuation
ValueCountFrequency (%)
&22426
81.4%
,5136
 
18.6%
Space Separator
ValueCountFrequency (%)
157436
100.0%
Math Symbol
ValueCountFrequency (%)
+2337
100.0%
Decimal Number
Value  Count  Frequency (%)
2      190    100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1579572
89.4%
Common187525
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e182527
 
11.6%
s142303
 
9.0%
i130588
 
8.3%
o123340
 
7.8%
n99911
 
6.3%
r93569
 
5.9%
a92150
 
5.8%
l72523
 
4.6%
d67367
 
4.3%
m63470
 
4.0%
Other values (33)511824
32.4%
Common
ValueCountFrequency (%)
157436
84.0%
&22426
 
12.0%
,5136
 
2.7%
+2337
 
1.2%
2190
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1767097
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e182527
 
10.3%
157436
 
8.9%
s142303
 
8.1%
i130588
 
7.4%
o123340
 
7.0%
n99911
 
5.7%
r93569
 
5.3%
a92150
 
5.2%
l72523
 
4.1%
d67367
 
3.8%
Other values (38)605383
34.3%

garment_group_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1010.43829
Minimum1001
Maximum1025
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size824.7 KiB

Quantile statistics

Minimum 1001
5-th percentile 1002
Q1 1005
median 1009
Q3 1017
95-th percentile 1020
Maximum 1025
Range 24
Interquartile range (IQR) 12

Descriptive statistics

Standard deviation6.731023182
Coefficient of variation (CV)0.006661488632
Kurtosis-1.287044974
Mean1010.43829
Median Absolute Deviation (MAD)6
Skewness0.3187516231
Sum106643678
Variance45.30667307
MonotonicityNot monotonic
Histogram with fixed size bins (bins=21)
Value              Count  Frequency (%)
1005               21445  20.3%
1019               11519  10.9%
1002               8126   7.7%
1003               7490   7.1%
1017               7441   7.1%
1009               6727   6.4%
1010               5838   5.5%
1020               5145   4.9%
1013               4874   4.6%
1007               4501   4.3%
Other values (11)  22436  21.3%

Value  Count  Frequency (%)
1001   3873   3.7%
1002   8126   7.7%
1003   7490   7.1%
1005   21445  20.3%
1006   1965   1.9%
1007   4501   4.3%
1008   908    0.9%
1009   6727   6.4%
1010   5838   5.5%
1011   2116   2.0%

Value  Count  Frequency (%)
1025   1559   1.5%
1023   1061   1.0%
1021   2272   2.2%
1020   5145   4.9%
1019   11519  10.9%
1018   2787   2.6%
1017   7441   7.1%
1016   3100   2.9%
1014   1541   1.5%
1013   4874   4.6%

garment_group_name
Categorical

HIGH CORRELATION

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size6.8 MiB
Jersey Fancy
21445 
Accessories
11519 
Jersey Basic
8126 
Knitwear
7490 
Under-, Nightwear
7441 
Other values (16)
49521 

Length

Max length29
Median length17
Mean length10.95181065
Min length5

Characters and Unicode

Total characters1155876
Distinct characters40
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJersey Basic
2nd rowJersey Basic
3rd rowJersey Basic
4th rowUnder-, Nightwear
5th rowUnder-, Nightwear

Common Values

ValueCountFrequency (%)
Jersey Fancy21445
20.3%
Accessories11519
10.9%
Jersey Basic8126
 
7.7%
Knitwear7490
 
7.1%
Under-, Nightwear7441
 
7.1%
Trousers6727
 
6.4%
Blouses5838
 
5.5%
Shoes5145
 
4.9%
Dresses Ladies4874
 
4.6%
Outdoor4501
 
4.3%
Other values (11)22436
21.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
jersey29571
18.3%
fancy21445
13.3%
accessories11519
 
7.1%
trousers9827
 
6.1%
basic8126
 
5.0%
knitwear7490
 
4.6%
under7441
 
4.6%
nightwear7441
 
4.6%
blouses5838
 
3.6%
shoes5145
 
3.2%
Other values (20)47761
29.6%

Most occurring characters

ValueCountFrequency (%)
e160751
13.9%
s150245
13.0%
r108764
 
9.4%
i59052
 
5.1%
a57461
 
5.0%
n57297
 
5.0%
56062
 
4.9%
c55942
 
4.8%
y54946
 
4.8%
o51000
 
4.4%
Other values (30)344356
29.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter918164
79.4%
Uppercase Letter161297
 
14.0%
Space Separator56062
 
4.9%
Other Punctuation12912
 
1.1%
Dash Punctuation7441
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e160751
17.5%
s150245
16.4%
r108764
11.8%
i59052
 
6.4%
a57461
 
6.3%
n57297
 
6.2%
c55942
 
6.1%
y54946
 
6.0%
o51000
 
5.6%
t32104
 
3.5%
Other values (13)130602
14.2%
Uppercase Letter
ValueCountFrequency (%)
J31536
19.6%
F21445
13.3%
S17735
11.0%
B15929
9.9%
T12099
 
7.5%
A11519
 
7.1%
U11314
 
7.0%
D10423
 
6.5%
K9455
 
5.9%
N7441
 
4.6%
Other values (3)12401
 
7.7%
Other Punctuation
ValueCountFrequency (%)
,7441
57.6%
/5471
42.4%
Space Separator
ValueCountFrequency (%)
56062
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1079461
93.4%
Common76415
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e160751
14.9%
s150245
13.9%
r108764
 
10.1%
i59052
 
5.5%
a57461
 
5.3%
n57297
 
5.3%
c55942
 
5.2%
y54946
 
5.1%
o51000
 
4.7%
t32104
 
3.0%
Other values (26)291899
27.0%
Common
ValueCountFrequency (%)
56062
73.4%
,7441
 
9.7%
-7441
 
9.7%
/5471
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII1155876
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e160751
13.9%
s150245
13.0%
r108764
 
9.4%
i59052
 
5.1%
a57461
 
5.0%
n57297
 
5.0%
56062
 
4.9%
c55942
 
4.8%
y54946
 
4.8%
o51000
 
4.4%
Other values (30)344356
29.8%

detail_desc
Categorical

HIGH CARDINALITY

Distinct43404
Distinct (%)41.3%
Missing416
Missing (%)0.4%
Memory size20.9 MiB
T-shirt in printed cotton jersey.
 
159
Leggings in soft organic cotton jersey with an elasticated waist.
 
138
T-shirt in soft, printed cotton jersey.
 
137
Socks in a soft, jacquard-knit cotton blend with elasticated tops.
 
136
Fine-knit trainer socks in a soft cotton blend with elasticated tops.
 
134
Other values (43399)
104422 

Length

Max length764
Median length468
Mean length142.161901
Min length11

Characters and Unicode

Total characters14944912
Distinct characters98
Distinct categories14
Distinct scripts2
Distinct blocks5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique21430
Unique (%)20.4%
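
The report lists Distinct (43404) and Unique (21430) separately. The sketch below assumes that "distinct" counts different non-missing values and "unique" counts values that occur exactly once. That reading is consistent with the figures here but is an assumption about the profiler, and the helper distinct_and_unique is illustrative. Both percentages also appear to be taken relative to the non-missing rows (105542 - 416 = 105126), which the last lines check.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) counts, ignoring missing (None) values."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                              # different values
    unique = sum(1 for n in counts.values() if n == 1)  # values seen once
    return distinct, unique


sample = ["a", "a", "b", "c", None]
print(distinct_and_unique(sample))

# Percentages recomputed from the reported detail_desc figures.
non_missing = 105542 - 416
distinct_pct = round(100 * 43404 / non_missing, 1)
unique_pct = round(100 * 21430 / non_missing, 1)
print(distinct_pct, unique_pct)
```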

Sample

1st rowJersey top with narrow shoulder straps.
2nd rowJersey top with narrow shoulder straps.
3rd rowJersey top with narrow shoulder straps.
4th rowMicrofibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
5th rowMicrofibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.

Common Values

ValueCountFrequency (%)
T-shirt in printed cotton jersey.159
 
0.2%
Leggings in soft organic cotton jersey with an elasticated waist.138
 
0.1%
T-shirt in soft, printed cotton jersey.137
 
0.1%
Socks in a soft, jacquard-knit cotton blend with elasticated tops.136
 
0.1%
Fine-knit trainer socks in a soft cotton blend with elasticated tops.134
 
0.1%
Socks in a soft, fine-knit cotton blend with elasticated tops.118
 
0.1%
Sunglasses with plastic frames and UV-protective, tinted lenses.117
 
0.1%
Boxer shorts in a cotton weave with an elasticated waist, long legs and button fly.104
 
0.1%
Tights in a soft, fine-knit cotton blend with an elasticated waist.97
 
0.1%
Fine-knit socks in a soft cotton blend.97
 
0.1%
Other values (43394)103889
98.4%
(Missing)416
 
0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and160065
 
6.4%
a151693
 
6.0%
with150703
 
6.0%
the135045
 
5.4%
in105374
 
4.2%
at80688
 
3.2%
back36807
 
1.5%
front36244
 
1.4%
soft35579
 
1.4%
waist34284
 
1.4%
Other values (5000)1586260
63.1%

Most occurring characters

ValueCountFrequency (%)
2407661
16.1%
e1318549
 
8.8%
t1241876
 
8.3%
a1029247
 
6.9%
n910904
 
6.1%
i876105
 
5.9%
s828095
 
5.5%
o718446
 
4.8%
r618196
 
4.1%
d602822
 
4.0%
Other values (88)4393011
29.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11792249
78.9%
Space Separator2407661
 
16.1%
Other Punctuation358067
 
2.4%
Uppercase Letter234150
 
1.6%
Dash Punctuation111399
 
0.7%
Decimal Number35387
 
0.2%
Open Punctuation2083
 
< 0.1%
Close Punctuation2082
 
< 0.1%
Other Symbol977
 
< 0.1%
Other Number444
 
< 0.1%
Other values (4)413
 
< 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1318549 | 11.2%
t | 1241876 | 10.5%
a | 1029247 | 8.7%
n | 910904 | 7.7%
i | 876105 | 7.4%
s | 828095 | 7.0%
o | 718446 | 6.1%
r | 618196 | 5.2%
d | 602822 | 5.1%
h | 543005 | 4.6%
Other values (21) | 3105004 | 26.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 50792 | 21.7%
L | 27632 | 11.8%
T | 24427 | 10.4%
F | 12014 | 5.1%
C | 11379 | 4.9%
V | 11244 | 4.8%
U | 8860 | 3.8%
P | 8723 | 3.7%
B | 8245 | 3.5%
J | 8091 | 3.5%
Other values (17) | 62743 | 26.8%

Other Punctuation
Value | Count | Frequency (%)
. | 207391 | 57.9%
, | 147684 | 41.2%
/ | 2255 | 0.6%
& | 412 | 0.1%
% | 199 | 0.1%
: | 57 | < 0.1%
' | 42 | < 0.1%
" | 12 | < 0.1%
! | 10 | < 0.1%
? | 5 | < 0.1%

Decimal Number
Value | Count | Frequency (%)
5 | 8514 | 24.1%
3 | 4986 | 14.1%
1 | 4894 | 13.8%
4 | 4657 | 13.2%
2 | 3788 | 10.7%
0 | 2949 | 8.3%
8 | 1849 | 5.2%
6 | 1498 | 4.2%
7 | 1293 | 3.7%
9 | 959 | 2.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 109665 | 98.4%
(character lost) | 1730 | 1.6%
(character lost) | 4 | < 0.1%

Other Symbol
Value | Count | Frequency (%)
(character lost) | 847 | 86.7%
® | 117 | 12.0%
° | 13 | 1.3%

Math Symbol
Value | Count | Frequency (%)
+ | 10 | 83.3%
> | 1 | 8.3%
< | 1 | 8.3%

Open Punctuation
Value | Count | Frequency (%)
( | 2082 | > 99.9%
{ | 1 | < 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 2081 | > 99.9%
} | 1 | < 0.1%

Final Punctuation
Value | Count | Frequency (%)
(character lost) | 384 | 97.0%
(character lost) | 12 | 3.0%

Initial Punctuation
Value | Count | Frequency (%)
(character lost) | 3 | 75.0%
(character lost) | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2407661 | 100.0%

Other Number
Value | Count | Frequency (%)
½ | 444 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
´ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12026399 | 80.5%
Common | 2918513 | 19.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 1318549 | 11.0%
t | 1241876 | 10.3%
a | 1029247 | 8.6%
n | 910904 | 7.6%
i | 876105 | 7.3%
s | 828095 | 6.9%
o | 718446 | 6.0%
r | 618196 | 5.1%
d | 602822 | 5.0%
h | 543005 | 4.5%
Other values (48) | 3339154 | 27.8%

Common
Value | Count | Frequency (%)
(space) | 2407661 | 82.5%
. | 207391 | 7.1%
, | 147684 | 5.1%
- | 109665 | 3.8%
5 | 8514 | 0.3%
3 | 4986 | 0.2%
1 | 4894 | 0.2%
4 | 4657 | 0.2%
2 | 3788 | 0.1%
0 | 2949 | 0.1%
Other values (30) | 16324 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 14936558 | 99.9%
None | 5372 | < 0.1%
Punctuation | 2134 | < 0.1%
Letterlike Symbols | 847 | < 0.1%
Alphabetic PF | 1 | < 0.1%

Most frequent character per block

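ASCII
Value | Count | Frequency (%)
(space) | 2407661 | 16.1%
e | 1318549 | 8.8%
t | 1241876 | 8.3%
a | 1029247 | 6.9%
n | 910904 | 6.1%
i | 876105 | 5.9%
s | 828095 | 5.5%
o | 718446 | 4.8%
r | 618196 | 4.1%
d | 602822 | 4.0%
Other values (71) | 4384657 | 29.4%

None
Value | Count | Frequency (%)
é | 2476 | 46.1%
ê | 2210 | 41.1%
½ | 444 | 8.3%
® | 117 | 2.2%
É | 102 | 1.9%
° | 13 | 0.2%
ñ | 6 | 0.1%
à | 3 | 0.1%
´ | 1 | < 0.1%

Punctuation
Value | Count | Frequency (%)
(character lost) | 1730 | 81.1%
(character lost) | 384 | 18.0%
(character lost) | 12 | 0.6%
(character lost) | 4 | 0.2%
(character lost) | 3 | 0.1%
(character lost) | 1 | < 0.1%

Letterlike Symbols
Value | Count | Frequency (%)
(character lost) | 847 | 100.0%

Alphabetic PF
Value | Count | Frequency (%)
(character lost) | 1 | 100.0%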

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
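
The recipe above can be sketched in plain Python as follows. This is a minimal illustration, not the report's implementation: the helper names `average_ranks`, `pearson` and `spearman` are assumptions, and ties are given average ranks, which is one common convention.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is (numerically) 1, r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```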

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
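
As a small sketch of this formula and of the invariance property, the snippet below computes r from population covariance and standard deviations, then shows that shifting and rescaling either variable leaves r unchanged, while negating one variable flips its sign. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Separate positive rescaling and shifting of each variable leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Negating one variable flips the sign but not the magnitude.
r_flipped = pearson_r(x, [-b for b in y])
print(r, r_shifted, r_flipped)
```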

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
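
A direct, quadratic-time sketch of this pair-counting definition is shown below. It implements the plain form described in the text (often called tau-a); statistical libraries frequently use tie-adjusted variants, so results can differ when the data contain ties. The function name `kendall_tau` is an assumption for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau as (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```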

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
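
For orientation, here is a hedged sketch of a bias-corrected Cramér's V computed from a contingency table, following the correction commonly attributed to Bergsma (2013). The function name `cramers_v_corrected` and the example tables are illustrative; the report generator's own code may differ in detail.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical 2x2 tables: perfect association and exact independence.
print(cramers_v_corrected([[10, 0], [0, 10]]))
print(cramers_v_corrected([[5, 5], [5, 5]]))
```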

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
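
The per-column nullity behind the first chart amounts to counting missing cells in each column. A minimal sketch, assuming records are held as a list of dicts with `None` marking a missing cell (the function name `nullity_by_column` and the rows are hypothetical):

```python
from collections import Counter


def nullity_by_column(rows):
    """Count missing (None) cells per column for a list of dict records."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Adding a bool increments by 1 for missing cells and by 0 otherwise,
            # so every column appears in the result even when it has no gaps.
            missing[column] += value is None
    return dict(missing)


# Hypothetical records using two of the dataset's column names.
rows = [
    {"article_id": 108775015, "detail_desc": "Jersey top with narrow shoulder straps."},
    {"article_id": 108775044, "detail_desc": None},
]
print(nullity_by_column(rows))
```

In this dataset, the overview reports 416 missing cells, which matches the 416 missing values in detail_desc.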

Sample

First rows

(index) | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc
0 | 108775015 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
1 | 108775044 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
2 | 108775051 | 108775 | Strap top (1) | 253 | Vest top | Garment Upper body | 1010017 | Stripe | 11 | Off White | 1 | Dusty Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps.
3 | 110065001 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
4 | 110065002 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
5 | 110065011 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 12 | Light Beige | 1 | Dusty Light | 11 | Beige | 1339 | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. Narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. Without visible seams for greater comfort.
6 | 111565001 | 111565 | 20 den 1p Stockings | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinforced trim at the top. Use with a suspender belt. 20 denier.
7 | 111565003 | 111565 | 20 den 1p Stockings | 302 | Socks | Socks & Tights | 1010016 | Solid | 13 | Beige | 2 | Medium Dusty | 11 | Beige | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinforced trim at the top. Use with a suspender belt. 20 denier.
8 | 111586001 | 111586 | Shape Up 30 den 1p Tights | 273 | Leggings/Tights | Garment Lower body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Tights with built-in support to lift the bottom. Black in 30 denier and light amber in 15 denier.
9 | 111593001 | 111593 | Support 40 den 1p Tights | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny tights that shape the tummy, thighs and calves while also encouraging blood circulation in the legs. Elasticated waist.

Last rows

(index) | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc
105532 | 949594001 | 949594 | LOGG Elvis jogger. | 272 | Trousers | Garment Lower body | 1010016 | Solid | 8 | Dark Grey | 4 | Dark | -1 | Unknown | 1919 | Jersey | A | Ladieswear | 1 | Ladieswear | 2 | H&M+ | 1005 | Jersey Fancy | Joggers in soft sweatshirt fabric with an elasticated, drawstring waist, diagonal side pockets and slim legs with ribbed hems.
105533 | 950449002 | 950449 | Compact brush Fancy | 78 | Other accessories | Accessories | 1010016 | Solid | 50 | Other Pink | 5 | Bright | 4 | Pink | 4313 | Girls Small Acc/Bags | J | Children Accessories, Swimwear | 4 | Baby/Children | 43 | Kids Accessories, Swimwear & D | 1019 | Accessories | Small, folding hair brush with a rhinestone-decorated lid that has a mirror inside. Diameter 6.5 cm.
105534 | 952267001 | 952267 | Heavy plain overknee tights 1p | 304 | Underwear Tights | Socks & Tights | 1010013 | Other pattern | 9 | Black | 4 | Dark | 5 | Black | 3608 | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Fine-knit tights with an elasticated waist that are thinner at the top and more opaque at the bottom giving them the appearance of over-the-knee socks.
105535 | 952937003 | 952937 | Jets dress | 265 | Dress | Garment Full body | 1010001 | All over pattern | 13 | Beige | 2 | Medium Dusty | 1 | Mole | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Fitted, calf-length dress in viscose jersey with a stand-up collar and concealed zip at the back. Double layer at the top with wrapover, draped sections, close-fitting, extra-long sleeves and an asymmetric skirt with a high slit in one side. Lined.
105536 | 952938001 | 952938 | Elton top | 254 | Top | Garment Upper body | 1010001 | All over pattern | 13 | Beige | 2 | Medium Dusty | 1 | Mole | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Fitted top in jersey with a round neckline and extra-long sleeves. Additional draped layer at the front.
105537 | 953450001 | 953450 | 5pk regular Placement1 | 302 | Socks | Socks & Tights | 1010014 | Placement print | 9 | Black | 4 | Dark | 5 | Black | 7188 | Socks Bin | F | Menswear | 3 | Menswear | 26 | Men Underwear | 1021 | Socks and Tights | Socks in a fine-knit cotton blend with a small motif at the top and elasticated tops.
105538 | 953763001 | 953763 | SPORT Malaga tank | 253 | Vest top | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1919 | Jersey | A | Ladieswear | 1 | Ladieswear | 2 | H&M+ | 1005 | Jersey Fancy | Loose-fitting sports vest top in ribbed fast-drying functional fabric made from recycled polyester with a racer back and rounded hem.
105539 | 956217002 | 956217 | Cartwheel dress | 265 | Dress | Garment Full body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Short, A-line dress in jersey with a round neckline and V-shaped opening at the front with narrow ties. Long, voluminous raglan sleeves and wide cuffs with covered buttons.
105540 | 957375001 | 957375 | CLAIRE HAIR CLAW | 72 | Hair clip | Accessories | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3946 | Small Accessories | D | Divided | 2 | Divided | 52 | Divided Accessories | 1019 | Accessories | Large plastic hair claw.
105541 | 959461001 | 959461 | Lounge dress | 265 | Dress | Garment Full body | 1010016 | Solid | 11 | Off White | 1 | Dusty Light | 9 | White | 1641 | Jersey | A | Ladieswear | 1 | Ladieswear | 18 | Womens Trend | 1005 | Jersey Fancy | Calf-length dress in ribbed jersey made from a cotton blend. Low-cut V-neck at the back, dropped shoulders and long, wide sleeves that taper to the cuffs. Unlined.